The NESPOLE! VoIP dialogue database
Abstract
This paper presents the status of the NESPOLE! data collection as of the end of February 2001. A multilingual VoIP (Voice over Internet Protocol) database consisting of 200 dialogues in 4 languages (English, German, Italian and French) was recorded and transcribed. Dialogue speakers were connected via an H.323 video-conferencing terminal. We describe the task, the technical architecture, the recording procedure and the transcription process of the NESPOLE! data collection. We provide some statistics on the data and, finally, address problems that arose during the collection and annotation process.
Similar resources
From generic to task-oriented speech recognition: French experience in the NESPOLE! European project
This paper presents the CLIPS laboratory's activities in speech recognition related to language model adaptation and acoustic model adaptation in the NESPOLE! European project. The ASR system needed to be adapted in two ways. The language model had to deal with task-specific vocabulary and the acoustic model had to be robust to VoIP (Voice over IP) speech. It was shown that Internet, as a very large sour...
Balancing Expressiveness and Simplicity in an Interlingua for Task Based Dialogue
In this paper we compare two interlingua representations for speech translation. The basis of this paper is a distributional analysis of the C-star II and Nespole databases tagged with interlingua representations. The C-star II database has been partially re-tagged with the Nespole interlingua, which enables us to make comparisons on the same data with two types of interlinguas and on two types...
The NESPOLE! VoIP multilingual corpora in tourism and medical domains
In this paper we present the multilingual VoIP (Voice over Internet Protocol) corpora collected for the second showcase of the Nespole! project in the tourism and medical domains. The corpora comprise over 20 hours of human-to-human monolingual dialogues in English, French, German and Italian: 66 dialogues in the tourism domain and 49 in the medical domain. We describe in detail the dat...
HLT modules scalability within the NESPOLE! project
The spoken dialogue translation project NESPOLE! proposed two showcases in order to focus on two important issues: scalability (namely, the capability of a system to progressively handle larger portions of a given domain) and cross-domain portability. Those concerns were rather new when the project was proposed. ShowCase-1 dealt with limited tourism, while ShowCase-2 consisted of ShowCase-2a on e...